%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
This exercise consists of 3 parts. Finish the first part to get a mark of 3.0, the first 2 parts to get 4.0, and all parts to get 5.0.
Part 1: Linear layer¶
1.1) Let us start with a linear regression problem. Consider a linear function with noise: $y = ax + b + noise$.
We use this formula to generate $100$ random samples.
### The number of samples
n = 100
### parameters of the linear function
a = -2
b = 3
1.2) Now, let us generate 100 samples and plot them.
### generate equally spaced x-values
x = np.linspace(-1, 1, n)
### generate y-values (numpy lets us generate the whole vector y in one expression)
y = a * x + b + np.random.normal(scale=0.25, size=n)
plt.scatter(x, y)
1.3) As you can see, the samples lie, more or less, along a single line.
Now, our aim is to find the best parameters for a linear function
so that the resulting model describes the given data as well as possible. For this reason, we will iteratively search the parameter space and update the model accordingly. Firstly, we need to define an error function. This function tells us how well (or badly) the instantiated model describes the data. Here we use the mean square error function.
We define a mean square error function as:
$MSE = \dfrac{1}{n}\sum_{i=1}^{n}\left(y_i - \widehat{y}_i \right)^2,$
where $y_i$ are the target values (i.e., the data) and $\widehat{y}_i$ are the model's output values.
See the MSE (mean square error) function given below.
def mse(y_target, y_calc):
return ((y_target - y_calc) ** 2).mean()
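As a quick sanity check, the MSE of two small hand-made vectors can be computed directly (the values below are purely illustrative):

```python
import numpy as np

def mse(y_target, y_calc):
    return ((y_target - y_calc) ** 2).mean()

# squared errors are 0, 0 and 4, so the mean is 4/3
y_t = np.array([1.0, 2.0, 3.0])
y_c = np.array([1.0, 2.0, 5.0])
example_mse = mse(y_t, y_c)
```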
1.4) Run the code below for different model parameters. Which parameter values give the best (i.e., minimal) MSE?
Answer: the best (lowest) MSE was achieved for parameters a = -2 and b = 3 (the original parameters).
a_2 = -2
b_2 = 3
y_calc = a_2 * x + b_2
print("MSE = " + str(mse(y, y_calc)))
plt.scatter(x, y, label="target")
plt.scatter(x, y_calc, label="calculated")
plt.legend()
MSE = 0.0656690568744481
1.5) We want to find the best possible model parameters automatically. For this reason, we use the gradient of a loss function. The gradient tells us the direction of the fastest increase/decrease of a given function. We use this information to update both model parameters. This procedure is performed iteratively. In each iteration, the parameters a and b are slightly modified so that the MSE is reduced (i.e., improved).
Firstly, finish the function below. It should calculate the batch gradient of the loss function, i.e., the gradient of the MSE for each point separately (y_target and y_calc are arrays, not scalars, so the output should also be an array).
def mse_grad(y_target, y_calc):
    # per-sample gradient of the squared error; the constant factor 2/n
    # of the full MSE gradient is dropped and absorbed into the learning rate
    return y_calc - y_target
### TEST
print(mse_grad(y, y_calc))
[ 0.16435044 0.10530027 0.35953788 0.09790265 -0.20403568 -0.25837372 0.05689676 -0.29803973 -0.53201142 -0.0432235 0.33343447 0.10328341 -0.60409687 0.02027015 0.4736115 -0.21370767 0.21591511 0.40266068 0.56601944 -0.05137797 0.18767104 0.06748695 -0.1879076 -0.20617886 -0.15836553 -0.16336731 0.04617821 -0.00684341 -0.12529731 0.12018349 -0.38429573 0.15270568 0.19339822 0.00229354 -0.25456968 -0.12439183 -0.12556509 -0.15535872 -0.07311001 -0.15435593 0.32578137 0.26337263 0.07163233 0.25972974 -0.09615268 -0.03085522 0.01820603 0.10348768 0.09528317 -0.0145715 -0.02283473 0.02136038 -0.13141972 -0.00436607 0.0168016 -0.34427812 -0.06435793 -0.23591849 -0.34047201 0.40400917 0.35736458 0.02164308 0.08913521 -0.12835901 -0.17615402 -0.22754227 -0.66515678 -0.15496322 0.20720018 -0.49948568 -0.27297432 -0.23819038 -0.10911635 -0.05813818 -0.15376758 -0.13818918 0.00649086 0.58984702 -0.22837654 -0.54988911 0.08655574 0.22054453 0.08136499 -0.44276635 -0.28361487 -0.28359411 -0.51896772 0.1044107 -0.09017026 0.01769262 -0.21311841 -0.491339 0.30520476 0.26076527 0.15847168 0.26379813 -0.24886087 0.30243846 0.10489043 -0.01035905]
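One way to convince yourself that mse_grad is (up to a constant factor) the true gradient is a finite-difference check. This is a small sketch with made-up data, with the factor 2/n written out explicitly:

```python
import numpy as np

def mse(y_target, y_calc):
    return ((y_target - y_calc) ** 2).mean()

def mse_grad(y_target, y_calc):
    return y_calc - y_target

rng = np.random.default_rng(0)
y_t = rng.normal(size=5)
y_c = rng.normal(size=5)
n, eps = len(y_t), 1e-6

# numerical gradient of the MSE with respect to each entry of y_c
num_grad = np.empty(n)
for i in range(n):
    y_plus, y_minus = y_c.copy(), y_c.copy()
    y_plus[i] += eps
    y_minus[i] -= eps
    num_grad[i] = (mse(y_t, y_plus) - mse(y_t, y_minus)) / (2 * eps)

# the true gradient is (2/n) * (y_c - y_t); mse_grad drops the constant,
# which only rescales the learning rate during training
```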
1.6) Fill in the update function to calculate the gradients of parameters $a$ and $b$ based on the gradient of the loss function (grad_y) and the input vector (x). Then update parameters $a$ and $b$ based on their gradients and the learning rate (lr). Use batch gradient descent to update the parameters.
class LinearLayer:
def __init__(self, a, b):
self.a = a
self.b = b
def __call__(self, x):
return self.a * x + self.b
def update(self, x, grad_y, lr):
grad_a = (grad_y * x).mean()
grad_b = grad_y.mean()
self.a -= lr * grad_a
self.b -= lr * grad_b
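The gradients used in update follow from differentiating the MSE with respect to each parameter (with $\widehat{y}_i = a x_i + b$):

$\dfrac{\partial\, MSE}{\partial a} = \dfrac{2}{n}\sum_i \left(\widehat{y}_i - y_i\right) x_i, \qquad \dfrac{\partial\, MSE}{\partial b} = \dfrac{2}{n}\sum_i \left(\widehat{y}_i - y_i\right).$

The code drops the constant factor 2 (it only rescales the learning rate) and computes the sums as means.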
1.7) Write a Step function which calculates y_calc (the output of the model for input x), the model's loss, and the gradient of the loss, and then updates the model parameters.
def Step(x, y, model, lr):
y_calc = model(x)
loss = mse(y, y_calc)
grad_y = mse_grad(y, y_calc)
model.update(x, grad_y, lr)
return y_calc, loss
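Putting the pieces together, a minimal end-to-end check (a sketch on noiseless data, with assumed initial values) shows that repeated Steps drive the loss down and recover the true parameters:

```python
import numpy as np

def mse(y_target, y_calc):
    return ((y_target - y_calc) ** 2).mean()

def mse_grad(y_target, y_calc):
    return y_calc - y_target

class LinearLayer:
    def __init__(self, a, b):
        self.a = a
        self.b = b

    def __call__(self, x):
        return self.a * x + self.b

    def update(self, x, grad_y, lr):
        self.a -= lr * (grad_y * x).mean()
        self.b -= lr * grad_y.mean()

def Step(x, y, model, lr):
    y_calc = model(x)
    loss = mse(y, y_calc)
    model.update(x, mse_grad(y, y_calc), lr)
    return y_calc, loss

x = np.linspace(-1, 1, 50)
y = -2 * x + 3           # noiseless target: a = -2, b = 3
model = LinearLayer(1.1, 2.0)
_, first_loss = Step(x, y, model, 0.05)
for _ in range(999):
    _, last_loss = Step(x, y, model, 0.05)
# after many steps the loss shrinks and (a, b) approach (-2, 3)
```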
1.8) Fit the model for 100 epochs, with learning rate 0.05 and initial parameter values a = 1.1 and b = 2.
model = LinearLayer(1.1, 2)
lr = 0.05
epoch = 100
losses = []
for i in range(epoch):
y_calc, loss = Step(x, y, model, lr)
losses.append(loss)
plt.plot(losses)
Animation of the learning process
from matplotlib import animation, rc
rc('animation', html='jshtml')
model = LinearLayer(1.1, 2)
fig = plt.figure()
plt.scatter(x, y)
line, = plt.plot(x, y_calc, ".", c="orange")
plt.close()
def animate(i):
y_calc, loss = Step(x, y, model, lr)
line.set_ydata(y_calc)
return (line,)
animation.FuncAnimation(fig, animate, np.arange(0, epoch), interval=20)
1.9) Here is an example of how the same thing can be done in PyTorch.
# Imports
import torch
import torch.nn as nn
# Convert numpy array to torch tensor, [:,None] add an additional dimension
xt = torch.FloatTensor(x[:, None])
yt = torch.FloatTensor(y[:, None])
def mse(y_target, y_calc):
return ((y_target - y_calc) ** 2).mean()
class LinearLayer(nn.Module):
def __init__(self, a, b):
super(LinearLayer, self).__init__() # initialize torch functionality
        # change a and b to float tensors, and then to parameters;
        # the main difference between a tensor and a parameter is that a parameter
        # is registered with the module and its gradients are tracked for updates
self.a = nn.Parameter(torch.FloatTensor([a]).view(1, 1))
self.b = nn.Parameter(torch.FloatTensor([b]))
# forward function is similar to python __call__ but also contain torch functionality
def forward(self, x):
return x @ self.a + self.b # linear equation, @ means matrix multiplication for tensor
def update(self, lr):
with torch.no_grad(): # when we update parameter, we have to switch off gradient tracking
self.a.sub_(lr * self.a.grad) # inplace update of parameter a
self.a.grad.zero_() # clear gradient
self.b.sub_(lr * self.b.grad)
self.b.grad.zero_()
model = LinearLayer(-1.1, 0.2)
def torchStep(x, y, model, lr):
y_calc = model(x) # calculate the output of our model
loss = mse(y, y_calc) # calculate the loss
loss.backward() # calculate all gradients
model.update(lr) # update parameters
return loss, y_calc
loss, y_calc = torchStep(xt, yt, model, lr)
y_calc = y_calc.detach().cpu()
fig = plt.figure()
plt.scatter(xt[:, 0], yt)
line, = plt.plot(xt[:, 0], y_calc, c="orange")
plt.close()
def animate(i):
loss, y_calc = torchStep(xt, yt, model, lr)
    y_calc = y_calc.detach().cpu()
line.set_ydata(y_calc)
return (line,)
animation.FuncAnimation(fig, animate, np.arange(0, 100), interval=20)
# we can use an optimizer to update parameters based on their gradients
# the simplest one is stochastic gradient descent (SGD)
def torchStep2(x, y, model, optim):
optim.zero_grad() # clear gradients
y_calc = model(x) # calculate output of model
loss = mse(y, y_calc) # calculate loss
loss.backward() # calculate all gradients
    optim.step()  # make an optimizer step, which updates the parameters
return loss, y_calc
model = LinearLayer(-1.1, 0.2)
optim = torch.optim.SGD(model.parameters(), lr)
loss, y_calc = torchStep2(xt, yt, model, optim)
y_calc = y_calc.detach().cpu()
fig = plt.figure()
plt.scatter(xt[:, 0], yt)
line, = plt.plot(xt[:, 0], y_calc, c="orange")
plt.close()
def animate(i):
loss, y_calc = torchStep2(xt, yt, model, optim)
y_calc = y_calc.detach().cpu()
line.set_ydata(y_calc)
return (line,)
animation.FuncAnimation(fig, animate, np.arange(0, 100), interval=20)
Part 2: Convolution layer¶
# input image
image = np.array(
[
[0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0],
[1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0],
]
)
plt.imshow(image)
2.1) Write a function which calculates a convolution of an input matrix (image) with a 3x3 kernel (mask) and a bias. Do not use padding, so the output image should have size (input_width - 2) x (input_height - 2).
from scipy.signal import convolve2d  # only needed for the alternative below

def Convolution(image, kernel, bias):
    img_out = np.zeros((image.shape[0] - 2, image.shape[1] - 2))
    for i in range(image.shape[0] - 2):
        for j in range(image.shape[1] - 2):
            subarray = image[i:i + 3, j:j + 3]
            img_out[i, j] = np.sum(subarray * kernel) + bias
    # equivalent for symmetric kernels (note that convolve2d flips the kernel,
    # while the loop above computes a cross-correlation):
    # img_out = convolve2d(image, kernel, mode='valid') + bias
    return img_out
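A simple way to test the function: with an identity kernel (1 in the centre, 0 elsewhere) and zero bias, the output should be exactly the interior of the input. A small sketch with made-up data:

```python
import numpy as np

def Convolution(image, kernel, bias):
    img_out = np.zeros((image.shape[0] - 2, image.shape[1] - 2))
    for i in range(image.shape[0] - 2):
        for j in range(image.shape[1] - 2):
            img_out[i, j] = np.sum(image[i:i + 3, j:j + 3] * kernel) + bias
    return img_out

img = np.arange(16).reshape(4, 4).astype(float)
identity = np.zeros((3, 3))
identity[1, 1] = 1  # identity kernel: each output pixel copies the window centre
out = Convolution(img, identity, 0.0)
# out should equal the 2x2 interior of img
```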
# kernel (mask) which is mean filter
kernel = np.ones((3, 3)) / 9
kernel
array([[0.11111111, 0.11111111, 0.11111111],
[0.11111111, 0.11111111, 0.11111111],
[0.11111111, 0.11111111, 0.11111111]])
bias = -0.5
img_out = Convolution(image, kernel, bias)
plt.imshow(img_out)
2.2) Find kernels (masks) which detect horizontal and vertical lines. Pixels belonging to a line should be greater than zero and all the others less than zero. Use 3x3 masks.
Example
print(Convolution(np.array([[0,0,0,0,0],[0,0,0,0,0],[1,1,1,1,1],[0,0,0,0,0],[0,0,0,0,0]]), kernel_horizontal, -2))
[[-1. -1. -1.]
[ 1. 1. 1.]
[-1. -1. -1.]]
kernel_horizontal = np.array([[-1, -1, -1], [2, 2, 2], [-1, -1, -1]])
img_horizontal = Convolution(image, kernel_horizontal, -2)
plt.imshow(img_horizontal)
kernel_vertical = kernel_horizontal.T
img_vertical = Convolution(image, kernel_vertical, -2)
plt.imshow(img_vertical)
2.3) Complete the function to calculate ReLU.
def relu(x):
return np.maximum(0, x)
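A quick check that relu clips negatives to zero elementwise and leaves non-negative values unchanged:

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

out = relu(np.array([-2.0, -0.5, 0.0, 1.5]))
# negative entries become 0, the rest pass through
```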
2.4) Find bias values such that the output image's pixels have a value above 0 only if the original pixel is part of a horizontal/vertical line.
plt.imshow(relu(img_horizontal))
plt.show()
plt.imshow(relu(img_vertical))
Part 3: Deep network¶
import pandas as pd
# load iris dataset
df = pd.read_csv('iris.csv')
# n - number of elements in dataset
n = len(df)
# useful variables
feature_columns = ["sepal.length", "sepal.width", "petal.length", "petal.width"]
target_column = "variety"
class_number = 3
feature_number = 4
# dictionaries used to map class names to numbers and back
name_to_class = {"Setosa": 0, "Versicolor": 1, "Virginica": 2}
class_to_name = {0: "Setosa", 1: "Versicolor", 2: "Virginica"}
# conversion of class names to class numbers
df[target_column] = df[target_column].apply(lambda x: name_to_class[x])
# take raw numpy data
x = df[feature_columns].values
y = df[target_column].values
# normalize the data so that each network input has mean 0 and standard deviation 1
x = (x - x.mean(0)) / x.std(0)
print(x.mean(0))
print(x.std(0))
[-4.73695157e-16 -7.81597009e-16 -4.26325641e-16 -4.73695157e-16] [1. 1. 1. 1.]
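The same standardization can be verified on a tiny made-up matrix: after the transformation every column has mean 0 and standard deviation 1:

```python
import numpy as np

x_demo = np.array([[1.0, 10.0],
                   [2.0, 20.0],
                   [3.0, 30.0]])
# subtract the per-column mean and divide by the per-column standard deviation
x_norm = (x_demo - x_demo.mean(0)) / x_demo.std(0)
```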
# conversion numpy array to torch tensor
x = torch.FloatTensor(x)
y = torch.LongTensor(y)
# simple neural network with one hidden layer with hidden_nr neurons
# input_layer calculates features which are used by hidden_layer to calculate the prediction
# between input_layer and hidden_layer there is a ReLU as a nonlinear activation function
# the network returns raw scores (logits); nn.CrossEntropyLoss applies softmax internally,
# so no sigmoid/softmax should be applied in forward
class Net(nn.Module):
    def __init__(self, input_nr, hidden_nr, output_nr):
        super(Net, self).__init__()
        self.input_layer = nn.Linear(input_nr, hidden_nr)
        self.hidden_layer = nn.Linear(hidden_nr, output_nr)

    def forward(self, x):
        x = self.input_layer(x)
        x = torch.relu(x)
        return self.hidden_layer(x)
Cross-entropy loss equals $-\mathbb{1}[y=0] \log(p_0) - \mathbb{1}[y=1] \log(p_1) - \mathbb{1}[y=2] \log(p_2)$, where $p_0, p_1, p_2$ are the predicted probabilities of classes 0, 1, 2, and $\mathbb{1}[y=k]$ equals 1 when the target class is $k$ and 0 otherwise.
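For one sample this can be computed by hand; with assumed probabilities $p = (0.7, 0.2, 0.1)$ and true class 0, the loss is $-\log 0.7 \approx 0.357$ (note that nn.CrossEntropyLoss itself takes raw logits and applies softmax internally):

```python
import numpy as np

p = np.array([0.7, 0.2, 0.1])  # assumed predicted class probabilities
y_true = 0                     # true class index
ce = -np.log(p[y_true])        # cross-entropy for this single sample
```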
loss_func = nn.CrossEntropyLoss()
# accuracy is the fraction of samples classified correctly
def Accuracy(y_target, y_calc):
    prediction_class = y_calc.max(1)[1]
    number_of_correct = (prediction_class == y_target).float().sum()
    return number_of_correct / len(y_target)
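The same computation in plain numpy, on a made-up batch of 4 samples, 3 of which are classified correctly:

```python
import numpy as np

y_true = np.array([0, 1, 2, 1])
scores = np.array([[2.0, 0.1, 0.3],
                   [0.2, 1.5, 0.1],
                   [0.1, 0.2, 0.4],
                   [1.0, 0.5, 0.2]])  # row-wise class scores
pred = scores.argmax(1)               # predicted class per sample
acc = (pred == y_true).mean()         # fraction of correct predictions
```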
def Step(x, y, model, optim):
optim.zero_grad()
y_calc = model(x)
loss = loss_func(y_calc, y)
loss.backward()
optim.step()
acc = Accuracy(y, y_calc)
return loss, y_calc, acc
# the Train function trains the model for epoch steps and collects metrics (loss and accuracy)
def Train(x, y, model, optim, epoch):
losses = []
accuracies = []
for i in range(epoch):
loss, y_calc, acc = Step(x, y, model, optim)
losses.append(loss.detach().numpy())
        accuracies.append(float(acc))
return losses, accuracies
lr = 0.1
# create a model and an optimizer
hidden_nr = 5
model = Net(feature_number, hidden_nr, class_number)
optim = torch.optim.SGD(model.parameters(), lr)
epoch = 200
losses, accuracies = Train(x, y, model, optim, epoch)
plt.plot(losses)
plt.show()
plt.plot(accuracies)
Part 3:¶
3.1) Create a report testing different values of the learning rate and of the number of neurons in the hidden layer. Run every test 10 times with 200 epochs. Plot the mean loss and the mean accuracy for each value in each test case.
test case 1:
learning rate: [1, 0.5, 0.1, 0.01, 0.001]
number of neurons in hidden layer: 10
test case 2:
number of neurons in hidden layer: [1, 2, 5, 10, 20, 100]
learning rate: 0.1
Testing different learning rates¶
hidden_nr = 10
losses_arr = []
accuracies_arr = []
for lr in [1, 0.5, 0.1, 0.01, 0.001]:
ls = []
acc = []
for _ in range(10):
model = Net(feature_number, hidden_nr, class_number)
optim = torch.optim.SGD(model.parameters(), lr)
losses, accuracies = Train(x, y, model, optim, 200)
ls.append(losses[-1])
acc.append(accuracies[-1])
losses_arr.append(sum(ls)/10)
accuracies_arr.append(sum(acc)/10)
plt.plot([0.001, 0.01, 0.1, 0.5, 1], losses_arr[::-1], '-o')
plt.xscale('log')
plt.xlabel('Value of learning rate')
plt.ylabel('Average loss')
plt.title('Loss depending on value of learning rate')
plt.show()
plt.plot([0.001, 0.01, 0.1, 0.5, 1], accuracies_arr[::-1], '-o')
plt.xscale('log')
plt.xlabel('Value of learning rate')
plt.ylabel('Average accuracy')
plt.title('Accuracy depending on value of learning rate')
plt.show()
Testing different number of neurons¶
lr = 0.1
losses_arr = []
accuracies_arr = []
for nr_of_neurons in [1, 2, 5, 10, 20, 100]:
ls = []
acc = []
for _ in range(10):
model = Net(feature_number, nr_of_neurons, class_number)
optim = torch.optim.SGD(model.parameters(), lr)
losses, accuracies = Train(x, y, model, optim, 200)
ls.append(losses[-1])
acc.append(accuracies[-1])
losses_arr.append(sum(ls)/10)
accuracies_arr.append(sum(acc)/10)
plt.plot([1, 2, 5, 10, 20, 100], losses_arr, '-o')
plt.xscale('log')
plt.xlabel('Number of neurons in hidden layer')
plt.ylabel('Average loss')
plt.title('Loss depending on number of neurons in hidden layer')
plt.show()
plt.plot([1, 2, 5, 10, 20, 100], accuracies_arr, '-o')
plt.xscale('log')
plt.xlabel('Number of neurons in hidden layer')
plt.ylabel('Average accuracy')
plt.title('Accuracy depending on number of neurons in hidden layer')
plt.show()